{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 助教哥你好呀~\n",
"\n",
"### 这个文件是可以**直接运行**的加载模型的代码\n",
"\n",
"-----\n",
"\n",
"*因为中间有一些处理数据的过程, 整个文件运行时间大概在十分钟, 用jupyter notebook打开可以看到我最后一次运行与输出的结果*\n",
"\n",
"##### 助教哥辛苦了"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"E:\\Anaconda3\\lib\\site-packages\\h5py\\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.\n",
" from ._conv import register_converters as _register_converters\n",
"Using TensorFlow backend.\n"
]
}
],
"source": [
"from keras.models import load_model\n",
"\n",
"model = load_model('最高分的训练好的模型.h5')"
]
},
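{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to confirm the network loaded correctly before the longer preprocessing steps, an optional check (a minimal sketch, not part of the original run) is:\n",
"\n",
"```python\n",
"model.summary()  # print the layer-by-layer architecture of the loaded model\n",
"```"
]
},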
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**接下来 读取预处理好的测试数据**"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" text | \n",
" class | \n",
" positive | \n",
"
\n",
" \n",
" | index | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 18年结婚 哈哈哈 | \n",
" 0 | \n",
" 0.900696 | \n",
"
\n",
" \n",
" | 1 | \n",
" 2017最后顿大餐吃完两人世界明年就是三个人一起啦许下生日愿望️希望一家人都能顺利平安健康🏻🏻🏻 | \n",
" 1 | \n",
" 0.999904 | \n",
"
\n",
" \n",
" | 2 | \n",
" 意盎然的季节!祝愿大家都生机勃勃,郁郁葱葱! | \n",
" 2 | \n",
" 0.736431 | \n",
"
\n",
" \n",
" | 3 | \n",
" 2017 遇见挚友 遇见我老公 结了婚有了小芒果 希望2018也超级美好️ | \n",
" 3 | \n",
" 0.983905 | \n",
"
\n",
" \n",
" | 4 | \n",
" 2018.1.1 | \n",
" 4 | \n",
" 0.500000 | \n",
"
\n",
" \n",
" | 5 | \n",
" 2018加油! | \n",
" 5 | \n",
" 0.895319 | \n",
"
\n",
" \n",
" | 6 | \n",
" 2018年做一个更加真实的自己。️ | \n",
" 3 | \n",
" 0.783433 | \n",
"
\n",
" \n",
" | 7 | \n",
" 2018年的第一天,完美的错过了一辆公交车。 德州 | \n",
" 6 | \n",
" 0.934181 | \n",
"
\n",
" \n",
" | 8 | \n",
" 2018年目标1.赚钱买房2.谈场恋爱,遇到对的人就结婚3.拥有一副健康的身体4.学会一种乐... | \n",
" 7 | \n",
" 0.999799 | \n",
"
\n",
" \n",
" | 9 | \n",
" 2018年第一个假期:元旦,就这么过去了,感冒咳嗽发高烧给这个元旦带来了不一样的节日,好快呀... | \n",
" 8 | \n",
" 0.733896 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" text class positive\n",
"index \n",
"0 18年结婚 哈哈哈 0 0.900696\n",
"1 2017最后顿大餐吃完两人世界明年就是三个人一起啦许下生日愿望️希望一家人都能顺利平安健康🏻🏻🏻 1 0.999904\n",
"2 意盎然的季节!祝愿大家都生机勃勃,郁郁葱葱! 2 0.736431\n",
"3 2017 遇见挚友 遇见我老公 结了婚有了小芒果 希望2018也超级美好️ 3 0.983905\n",
"4 2018.1.1 4 0.500000\n",
"5 2018加油! 5 0.895319\n",
"6 2018年做一个更加真实的自己。️ 3 0.783433\n",
"7 2018年的第一天,完美的错过了一辆公交车。 德州 6 0.934181\n",
"8 2018年目标1.赚钱买房2.谈场恋爱,遇到对的人就结婚3.拥有一副健康的身体4.学会一种乐... 7 0.999799\n",
"9 2018年第一个假期:元旦,就这么过去了,感冒咳嗽发高烧给这个元旦带来了不一样的节日,好快呀... 8 0.733896"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import jieba\n",
"\n",
"dff = pd.read_csv(\"./Preprocessed_data/train.csv\",index_col=0)\n",
"dff['text'] = dff['text'].fillna('')\n",
"dff.head(10)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" text | \n",
" class | \n",
" positive | \n",
"
\n",
" \n",
" | index | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 我是正面哦 | \n",
" 0 | \n",
" 0.347826 | \n",
"
\n",
" \n",
" | 1 | \n",
" 爱是恒久忍耐,又有恩慈。爱是不嫉妒,不自夸,不张狂,不轻易发怒。不计算人的恶。凡事包容。凡事... | \n",
" 0 | \n",
" 0.496333 | \n",
"
\n",
" \n",
" | 2 | \n",
" 讨厌死了,上班上班上班不停的上班我真的超级累。什么都不干还是超级超级累。 | \n",
" 0 | \n",
" 0.000422 | \n",
"
\n",
" \n",
" | 3 | \n",
" 矮马大半夜的放肌肉男不让人睡觉了 | \n",
" 0 | \n",
" 0.409895 | \n",
"
\n",
" \n",
" | 4 | \n",
" 谢谢陈先生。 | \n",
" 0 | \n",
" 0.768959 | \n",
"
\n",
" \n",
" | 5 | \n",
" 我的2016要早点睡别熬夜 | \n",
" 0 | \n",
" 0.625607 | \n",
"
\n",
" \n",
" | 6 | \n",
" 周锐锐哥!爱你 | \n",
" 0 | \n",
" 0.970187 | \n",
"
\n",
" \n",
" | 7 | \n",
" 塞尼亚岛 | \n",
" 0 | \n",
" 0.500000 | \n",
"
\n",
" \n",
" | 8 | \n",
" 只可惜没能去现场 | \n",
" 0 | \n",
" 0.100791 | \n",
"
\n",
" \n",
" | 9 | \n",
" 自从发现这个号都处于一种忍不住不看看了睡不着的状态 | \n",
" 0 | \n",
" 0.355194 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" text class positive\n",
"index \n",
"0 我是正面哦 0 0.347826\n",
"1 爱是恒久忍耐,又有恩慈。爱是不嫉妒,不自夸,不张狂,不轻易发怒。不计算人的恶。凡事包容。凡事... 0 0.496333\n",
"2 讨厌死了,上班上班上班不停的上班我真的超级累。什么都不干还是超级超级累。 0 0.000422\n",
"3 矮马大半夜的放肌肉男不让人睡觉了 0 0.409895\n",
"4 谢谢陈先生。 0 0.768959\n",
"5 我的2016要早点睡别熬夜 0 0.625607\n",
"6 周锐锐哥!爱你 0 0.970187\n",
"7 塞尼亚岛 0 0.500000\n",
"8 只可惜没能去现场 0 0.100791\n",
"9 自从发现这个号都处于一种忍不住不看看了睡不着的状态 0 0.355194"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dfTest = pd.read_csv(\"./Preprocessed_data/test.csv\",index_col=0)\n",
"dfTest['text'] = dfTest['text'].fillna('')\n",
"dfTest.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**还有一点处理, 很快了**"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Building prefix dict from the default dictionary ...\n",
"Loading model from cache C:\\Users\\Kai\\AppData\\Local\\Temp\\jieba.cache\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"0\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Loading model cost 0.810 seconds.\n",
"Prefix dict has been built succesfully.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"100000\n",
"200000\n",
"300000\n",
"400000\n",
"500000\n",
"600000\n",
"700000\n",
"800000\n",
"0\n",
"100000\n"
]
}
],
"source": [
"def stopwordslist():\n",
" f = open(\"./Preprocessed_data/stop.txt\", \"r\")\n",
" line = f.readline()\n",
" stopwords = []\n",
" index = 0\n",
" while line:\n",
" index += 1\n",
" line = line.replace('\\n', '')\n",
" line = line.replace('[', '')\n",
" line = line.replace(']', '')\n",
" line = line.replace(']', '')\n",
" line = line.replace('[', '')\n",
" \n",
" stopwords.append(line)\n",
" line = f.readline()\n",
"\n",
" return stopwords\n",
"\n",
"stopwords = stopwordslist()\n",
"\n",
"def seg_depart(sentence):\n",
" sentence_depart = jieba.cut(sentence.strip())\n",
" outstr = ''\n",
" for word in sentence_depart:\n",
" if word not in stopwords:\n",
" if word != '\\t':\n",
" outstr += word\n",
" outstr += \" \"\n",
" return outstr\n",
"\n",
"sen = dff['text'].values\n",
"\n",
"for i in range(len(sen)):\n",
" if i % 100000 == 0:\n",
" print(i)\n",
" sen[i] = seg_depart(sen[i])\n",
" \n",
"\n",
"senTest = dfTest['text'].values\n",
"\n",
"for i in range(len(senTest)):\n",
" if i % 100000 == 0:\n",
" print(i)\n",
" senTest[i] = seg_depart(senTest[i])\n",
" \n",
"\n",
"from keras.preprocessing.text import Tokenizer\n",
"from keras.preprocessing.sequence import pad_sequences\n",
"\n",
"MAX_NB_WORDS = 20000\n",
"tokenizer = Tokenizer(nb_words=MAX_NB_WORDS, char_level=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**上面的输出是运行进度的一些信息, 上面的cell大概需要运行五分钟**\n",
"\n",
"*很快就好啦*"
]
},
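{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, `seg_depart` turns one raw sentence into space-separated jieba tokens with the stop words removed. A minimal illustration (the sentence is made up, and the surviving tokens depend on the contents of stop.txt):\n",
"\n",
"```python\n",
"# Hypothetical example, not part of the original run\n",
"print(seg_depart('2018年的第一天,天气很好'))  # prints the space-joined tokens that pass the stop-word filter\n",
"```"
]
},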
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"tokenizer.fit_on_texts(sen)\n",
"sequences_test = tokenizer.texts_to_sequences(senTest)\n",
"MAX_SEQUENCE_LENGTH = 300\n",
"\n",
"x_test = pad_sequences(sequences_test, maxlen=MAX_SEQUENCE_LENGTH)"
]
},
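{
"cell_type": "markdown",
"metadata": {},
"source": [
"The tokenizer is fitted on the training texts so that the test set is mapped with the same word indices the model saw during training. A self-contained sketch of this fit/transform pattern on toy strings (not the real data):\n",
"\n",
"```python\n",
"from keras.preprocessing.text import Tokenizer\n",
"from keras.preprocessing.sequence import pad_sequences\n",
"\n",
"toy = Tokenizer(num_words=10)\n",
"toy.fit_on_texts(['new year happy', 'happy day'])  # build the word index on 'training' texts\n",
"seqs = toy.texts_to_sequences(['happy new day'])   # reuse that index on 'test' texts\n",
"print(pad_sequences(seqs, maxlen=5))               # left-pad every sequence to length 5\n",
"```"
]
},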
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0\n",
"50000\n",
"100000\n",
"150000\n"
]
}
],
"source": [
"import numpy as np\n",
"import csv\n",
"\n",
"pred = model.predict(x_test)\n",
"result = np.argmax(pred, axis = 1)\n",
"\n",
"# 写入文件\n",
"csvFile = open('FORCheckResult.csv','w', newline='', encoding='UTF-8') # 设置newline,否则两行之间会空一行\n",
"writer = csv.writer(csvFile)\n",
"\n",
"writer.writerow(['ID', 'Expected'])\n",
"for i in range(len(result)):\n",
" if i % 50000 == 0:\n",
" print(i)\n",
" writer.writerow([int(i), int(result[i])])\n",
" \n",
"csvFile.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 最高分的训练好的模型.h5 预测的 test.data 已经被输出到当前文件夹下的 FORCheckResult.csv 啦\n",
"\n",
"\n",
"\n",
"#### 辛苦了 {心}"
]
},
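{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check (a minimal sketch, not part of the original run), the submission file can be read back to confirm its format:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"pd.read_csv('FORCheckResult.csv').head()  # expect an ID column and an Expected column\n",
"```"
]
},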
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}